AD-A145 476  CONSTRAINED OPTIMIZATION USING ITERATED PARTIAL
KUHN-TUCKER VECTORS (U)  IOWA UNIV IOWA CITY DEPT OF
STATISTICS AND ACTUARIAL SCIENCE  R L DYKSTRA ET AL.
UNCLASSIFIED  MAY 84  TR-105  N00014-83-K-0249  F/G 12/1




CONSTRAINED  OPTIMIZATION 
USING  ITERATED  PARTIAL  KUHN-TUCKER  VECTORS 


by 

Richard  L.  Dykstra 
and 

Peter  C.  Wollan 

Department  of  Statistics  and  Actuarial  Science 
The  University  of  Iowa 
Iowa  City,  IA  52242 


Technical  Report  #105 
May  1984 


This research was partially supported by ONR Contract
N00014-83-K-0249.




ABSTRACT 

A frequently occurring problem is that of minimizing a
convex function subject to a finite set of inequality
constraints. Often what makes this problem difficult is the
sheer number of constraints. That is, we could solve this
problem for a smaller set of constraints, but solving for the
total set causes difficulty. Here we discuss an approach
which uses our ability to solve these partial problems to
lead to a total solution. We will illustrate the method with
several examples in the last section of the paper. Our
approach will be somewhat heuristic in nature to promote
understanding.
1.  INTRODUCTION.


Let us consider the following convex programming problem:
Minimize the convex function $f_0(x)$ (defined on some subset of
$R^n$) over the region $C$ subject to the constraints
$f_1(x) \le 0, \ldots, f_m(x) \le 0$, where the $f_i$,
$i = 1, \ldots, m$, are finite convex functions on $C$. We could
of course define $f_0(x)$ to be $+\infty$ for $x \notin C$, and
hence assume that our functions are defined over $R^n$. We shall
call $f_0(x)$ the objective function, and refer to $f_i(x)$,
$i = 1, \ldots, m$, as the constraint functions.

We shall define $\lambda = (\lambda_1, \ldots, \lambda_m) \in R^m$
to be a vector of Kuhn-Tucker coefficients for our problem, or
simply a Kuhn-Tucker vector, if $\lambda_i \ge 0$,
$i = 1, \ldots, m$, and if

$$\inf_{x \in C} \left[ f_0(x) + \lambda_1 f_1(x) + \cdots + \lambda_m f_m(x) \right]$$

is finite and equal to the optimal value of our original problem.
The existence of Kuhn-Tucker (KT) vectors is guaranteed
under mild conditions. (See Rockafellar (1970) for a discussion
of this material.) Part of the importance of KT vectors
stems from the fact that they can convert a constrained problem
into an unconstrained (or at least more simply constrained)
problem.
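As a small worked illustration of this definition (the one-dimensional problem below is a hypothetical example chosen for this note, not one taken from the report), consider minimizing $(x-2)^2$ over $C = R$ subject to $x - 1 \le 0$. The constrained optimum is $x = 1$ with value 1, and the computation below suggests that $\lambda = 2$ is the KT vector.

```latex
% Hypothetical example: f_0(x) = (x-2)^2, f_1(x) = x - 1, C = R.
% For fixed \lambda \ge 0 the unconstrained minimizer of
% f_0(x) + \lambda f_1(x) is x = 2 - \lambda/2, so
\inf_{x \in R} \left[ (x-2)^2 + \lambda (x-1) \right]
  = \frac{\lambda^2}{4} + \lambda\left(1 - \frac{\lambda}{2}\right)
  = \lambda - \frac{\lambda^2}{4}.
% This equals the constrained optimal value 1 exactly when
% \lambda^2 - 4\lambda + 4 = 0, that is, \lambda = 2,
% and the minimizer is then x = 2 - 1 = 1.
```

With $\lambda = 2$ the constrained problem has been traded for an unconstrained one with the same minimizer, which is the property the method relies on.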

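Another important construct is the Lagrangian associated
with our problem. It is defined as the function $L$ on
$R^m \times R^n$ given by

$$
L(\lambda, x) =
\begin{cases}
f_0(x) + \lambda_1 f_1(x) + \cdots + \lambda_m f_m(x), & x \in C,\ \lambda_i \ge 0,\ i = 1, \ldots, m \\
-\infty, & x \in C,\ \lambda_i < 0 \ \ (\exists\, i) \\
+\infty, & x \notin C.
\end{cases}
$$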

A vector pair $(\lambda^0, x^0)$ is said to be a saddle-point of $L$ if

$$L(\lambda, x^0) \le L(\lambda^0, x^0) \le L(\lambda^0, x) \quad \forall\, x, \lambda. \tag{1.1}$$

Such saddle-points play a big role in the following fundamental
theorem (Rockafellar, p. 281).

Theorem 1.1. In order that $\lambda^0$ be a KT vector and $x^0$ be
an optimal solution, it is necessary and sufficient that
$(\lambda^0, x^0)$ be a saddle-point of the Lagrangian $L$. Moreover,
this condition holds iff $x^0$ and $\lambda^0$ satisfy

(1.2)
(a) $\lambda_i^0 \ge 0$, $f_i(x^0) \le 0$ and $\lambda_i^0 f_i(x^0) = 0$, $i = 1, \ldots, m$;

(b) $0 \in \left[ \partial f_0(x^0) + \lambda_1^0\, \partial f_1(x^0) + \cdots + \lambda_m^0\, \partial f_m(x^0) \right]$.

The notation $\partial f(x^0)$ indicates the set of subgradients
of $f$ at $x^0$. Condition (b) implies that $x^0$ minimizes
$f_0(x) + \lambda_1^0 f_1(x) + \cdots + \lambda_m^0 f_m(x)$. Of course
if the $f_i$ are all differentiable, (1.2)(b) may be replaced by

(1.2)(b') $\nabla f_0(x^0) + \lambda_1^0 \nabla f_1(x^0) + \cdots + \lambda_m^0 \nabla f_m(x^0) = 0$.

Given that one can solve a constrained minimization problem,
condition (1.2)(b') can often be used to find a KT vector
for the problem. This fact shall prove useful later on.
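Conditions (1.2)(a) and (1.2)(b') are straightforward to check numerically in the differentiable case. The sketch below is only an illustration: the helper name `satisfies_kt`, the tolerance, and the two-variable toy problem are assumptions made for this example and do not appear in the report.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# A constraint is a pair (f_i, gradient of f_i).
Constraint = Tuple[Callable[[Vector], float], Callable[[Vector], List[float]]]


def satisfies_kt(
    grad_f0: Callable[[Vector], List[float]],
    constraints: Sequence[Constraint],
    x: Vector,
    lam: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check (1.2)(a) and (1.2)(b') at the point x for multipliers lam."""
    # (a): lam_i >= 0, f_i(x) <= 0 and lam_i * f_i(x) = 0 for every i.
    for (f, _), l in zip(constraints, lam):
        value = f(x)
        if l < -tol or value > tol or abs(l * value) > tol:
            return False
    # (b'): grad f_0(x) + sum_i lam_i * grad f_i(x) = 0.
    total = list(grad_f0(x))
    for (_, grad), l in zip(constraints, lam):
        for j, gj in enumerate(grad(x)):
            total[j] += l * gj
    return all(abs(t) <= tol for t in total)


# Hypothetical toy problem: f0(x) = (x1 - 2)^2 + (x2 - 1)^2,
# subject to f1(x) = x1 - 1 <= 0 and f2(x) = x2 - 3 <= 0.
grad_f0 = lambda x: [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]
constraints = [
    (lambda x: x[0] - 1.0, lambda x: [1.0, 0.0]),
    (lambda x: x[1] - 3.0, lambda x: [0.0, 1.0]),
]
x_opt = [1.0, 1.0]
lam_opt = [2.0, 0.0]  # second constraint is inactive, so its multiplier is 0
ok = satisfies_kt(grad_f0, constraints, x_opt, lam_opt)
```

Solving (b') for the multiplier once the constrained minimizer is known is how KT values are obtained in the applications of Section 4.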


2.  THE  METHOD. 

The basic idea behind our approach is that one can reduce
the number of constraints being considered by modifying the
objective function through the use of estimated KT vectors.
At each stage, the modified problem is solved, and updated
estimates of KT vectors are found. Under fairly general
conditions, the solutions to the modified problems must
converge to a true global solution.

To be more specific, we assume that our constraint functions
are grouped into vectors and given by

$$
\begin{aligned}
f_1(x) &= (f_{11}(x), f_{12}(x), \ldots, f_{1 m_1}(x)) \\
&\;\;\vdots \\
f_k(x) &= (f_{k1}(x), f_{k2}(x), \ldots, f_{k m_k}(x))
\end{aligned}
$$

where $\sum_{i=1}^{k} m_i = m$.

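We then define our Lagrangian as

$$
L(\lambda, x) =
\begin{cases}
f_0(x) + \lambda_1' f_1(x) + \cdots + \lambda_k' f_k(x), & x \in C,\ \lambda_{ij} \ge 0 \ \ \forall\, i,j \\
-\infty, & x \in C,\ \lambda_{ij} < 0 \ \ \exists\, i,j \\
+\infty, & x \notin C,
\end{cases}
$$

where $\lambda = (\lambda_1, \ldots, \lambda_k)$ and $\lambda_i$ is an $m_i \times 1$ vector.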

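We shall use $L_i(\lambda, x)$ to denote $L(\lambda, x)$ with $\lambda_i$ set
equal to zero, and $L_i(\lambda_i, x \mid \lambda)$ to denote $L(\lambda, x)$
considered as a function of $\lambda_i$ and $x$
($\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_k$ are
regarded as fixed).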

Initially, we set $\lambda_j^{(0,j)} = 0$, $j = 1, \ldots, k$, and
$\lambda^{(1,0)} = (\lambda_1^{(0,1)}, \ldots, \lambda_k^{(0,k)})$. Our
algorithm is sequentially defined as follows (beginning with
$n = 1$, $i = 0$):

a)  Let $x^{(n,i+1)}$ ($x^{(n+1,1)}$ if $i = k$) denote the
solution to

$$\text{Minimize } L_{i+1}(\lambda^{(n,i)}, x) \ \text{ over } \ \{x : f_{i+1,j}(x) \le 0 \ \ \forall\, j\}$$

($\{x : f_{1j}(x) \le 0 \ \forall\, j\}$ if $i = k$).

This is a convex programming problem which we assume we can
solve. We let $\lambda_{i+1}^{(n,i+1)}$ ($\lambda_1^{(n+1,1)}$ if $i = k$)
denote a KT vector for the problem. (Condition (1.2) may prove
useful in obtaining this KT vector.)


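b)  We now update our global KT estimate by setting

$$\lambda^{(n,i+1)} = (\lambda_1^{(n,1)}, \ldots, \lambda_{i+1}^{(n,i+1)}, \lambda_{i+2}^{(n-1,i+2)}, \ldots, \lambda_k^{(n-1,k)}).$$

(If $i = k$, we set

$$\lambda^{(n+1,1)} = (\lambda_1^{(n+1,1)}, \lambda_2^{(n,2)}, \ldots, \lambda_k^{(n,k)}).)$$

We then replace $(n,i)$ by $(n,i+1)$ ($(n+1,1)$ if $i = k$) and
return to step a).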


In many situations, the problems in step a) which must be
solved are always of the same form, and hence lead to easily
written computer programs for performing these steps. We shall
give some explicit examples in Section 4.
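Steps a) and b) can be sketched with the multipliers kept explicitly. The code below is a minimal illustration under stated assumptions, not the authors' program: it takes the unweighted least squares objective $\frac{1}{2}\sum_j (g_j - x_j)^2$, uses one linear constraint $a_i'x \le 0$ per block, and the names `solve_partial` and `iterate_partial_kt` are hypothetical. For this objective, zeroing $\lambda_i$ and completing the square turns step a) into a projection of $h = g - \sum_{r \ne i} \lambda_r a_r$ onto the half-space $a_i'x \le 0$.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve_partial(
    g: Sequence[float], A: Sequence[Sequence[float]], lam: Sequence[float], i: int
) -> Tuple[List[float], float]:
    """Step a): minimize the Lagrangian with lam[i] zeroed, subject to a_i'x <= 0.

    Returns the minimizer and a KT value for constraint i.
    """
    n = len(g)
    # Linear terms from the other constraints shift the least squares target.
    h = [g[j] - sum(lam[r] * A[r][j] for r in range(len(A)) if r != i) for j in range(n)]
    new_lam = max(0.0, dot(A[i], h)) / dot(A[i], A[i])
    x = [h[j] - new_lam * A[i][j] for j in range(n)]
    return x, new_lam


def iterate_partial_kt(
    g: Sequence[float], A: Sequence[Sequence[float]], sweeps: int = 200
) -> Tuple[List[float], List[float]]:
    lam = [0.0] * len(A)  # initial KT estimates are all zero
    x = list(g)
    for _ in range(sweeps):
        for i in range(len(A)):
            # Step a) solves the partial problem; step b) overwrites lam[i].
            x, lam[i] = solve_partial(g, A, lam, i)
    return x, lam


g = [2.0, 1.0]
A = [[1.0, 0.0], [1.0, 1.0]]  # constraints x1 <= 0 and x1 + x2 <= 0
x, lam = iterate_partial_kt(g, A)
# x approaches (0, 0) and lam approaches (1, 1) for this example.
```

Only one routine (the single-constraint solver) is needed, which is the practical point made in the paragraph above.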


3.  JUSTIFICATION  FOR  THE  ALGORITHM. 

The crucial fact that justifies this procedure is that
the Lagrangian cannot decrease at any step in the algorithm.
This follows since (for $i < k$)

$$
\begin{aligned}
L(\lambda^{(n,i)}, x^{(n,i)}) &= L_i(\lambda_i^{(n,i)}, x^{(n,i)} \mid \lambda^{(n,i)}) \\
&\le L_i(\lambda_i^{(n,i)}, x^{(n,i+1)} \mid \lambda^{(n,i)}) \quad \text{(by Thm. 1.1)} \\
&= L_{i+1}(\lambda_{i+1}^{(n,i)}, x^{(n,i+1)} \mid \lambda^{(n,i+1)}) \\
&\le L_{i+1}(\lambda_{i+1}^{(n,i+1)}, x^{(n,i+1)} \mid \lambda^{(n,i+1)}) \quad \text{(by Thm. 1.1)} \\
&= L(\lambda^{(n,i+1)}, x^{(n,i+1)}).
\end{aligned}
$$


A similar argument holds if $i = k$ for showing

$$L(\lambda^{(n,k)}, x^{(n,k)}) \le L(\lambda^{(n+1,1)}, x^{(n+1,1)}).$$

Moreover, we note that if $y \in C$ is any vector such that
$f_{ij}(y) \le 0$ $\forall\, i,j$, then

$$L(\lambda^{(n,i)}, x^{(n,i)}) \le L(\lambda^{(n,i)}, y) \le f_0(y), \tag{3.1}$$

so that $\lim_{n \to \infty} L(\lambda^{(n,i)}, x^{(n,i)})$ exists and is
finite, independently of $i$. Now, since $x^{(n,i)}$ minimizes a
convex function, under conditions which guarantee sufficient
curvature of $L(\lambda, x)$ (such as $x'Hx \ge \gamma \|x\|^2$ for some
$\gamma > 0$, where $H$ is the Hessian of $f_0$), we know that
$x^{(n,i+1)}$ must be close to $x^{(n,i)}$ for sufficiently large $n$.
Now, if $x^{(n,i)}$ must contain a convergent subsequence (such as
if $C$ is a bounded region) converging to $x^0 \in C$, then we only
need continuity properties of $f_0$ and $f_i$, $i = 1, \ldots, m$, to
guarantee that $f_i(x^0) \le 0$, $i = 1, \ldots, m$, and
$f_0(x^0) \le f_0(y)$ for all $y$ in $C$ which satisfy all constraints
(by (3.1)). Thus $x^0$ must be a solution to our problem, and since
every convergent subsequence converges to $x^0$, the algorithm
must work correctly.
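The monotonicity argument and the bound (3.1) can be observed numerically. The sketch below records the Lagrangian value after every partial solve for a small hypothetical instance (unweighted least squares with two half-space constraints, the same simplifying assumptions as any single-constraint block scheme); the function names are illustrative only.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def lagrangian(g, A, lam, x) -> float:
    """L(lam, x) = 1/2 * sum (g_j - x_j)^2 + sum_i lam_i * a_i'x, for lam >= 0."""
    f0 = 0.5 * sum((gj - xj) ** 2 for gj, xj in zip(g, x))
    return f0 + sum(l * dot(a, x) for l, a in zip(lam, A))


def lagrangian_trace(g, A, sweeps: int) -> List[float]:
    """Run the cyclic scheme and record L after each partial solve."""
    lam = [0.0] * len(A)
    values: List[float] = []
    for _ in range(sweeps):
        for i, a in enumerate(A):
            # Target shifted by the multipliers of the other constraints.
            h = [
                g[j] - sum(lam[r] * A[r][j] for r in range(len(A)) if r != i)
                for j in range(len(g))
            ]
            lam[i] = max(0.0, dot(a, h)) / dot(a, a)
            x = [hj - lam[i] * aj for hj, aj in zip(h, a)]
            values.append(lagrangian(g, A, lam, x))
    return values


g = [2.0, 1.0]
A = [[1.0, 0.0], [1.0, 1.0]]  # x1 <= 0 and x1 + x2 <= 0
values = lagrangian_trace(g, A, sweeps=50)
# y = (0, 0) is feasible, so by (3.1) every recorded value is at most f0(y).
bound = 0.5 * (g[0] ** 2 + g[1] ** 2)
```

In this instance the recorded values begin 2, 2.25, 2.375 and climb toward the bound 2.5, which is the optimal value of the constrained problem.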


4.  APPLICATIONS. 


1.  Let us first consider least squares problems under
linear inequality constraints. Thus, we wish to minimize the
objective function

$$f_0(x) = \frac{1}{2} \sum_{j=1}^{n} (g_j - x_j)^2 w_j$$

subject to the constraints

$$f_i(x) = x'a_i = \sum_{j=1}^{n} a_{ij} x_j \le 0, \quad i = 1, \ldots, m,$$

where $w > 0$, $g$, $a_1, \ldots, a_m$ are given $n \times 1$ vectors such that
there exists at least one vector $x$ where $a_i'x \le 0$ $\forall\, i$.

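Of course, the solution to our problem under a single
constraint $a_i'x \le 0$ can be easily found by the expression

$$
P_i(g) =
\begin{cases}
g & \text{if } \sum_{j=1}^{n} a_{ij} g_j \le 0 \\
\tilde{g} & \text{if } \sum_{j=1}^{n} a_{ij} g_j > 0,
\end{cases}
\tag{4.1}
$$

where

$$\tilde{g}_j = g_j - \left( \sum_{\ell=1}^{n} a_{i\ell}\, g_\ell \right) a_{ij}\, w_j^{-1} \Big/ \sum_{\ell=1}^{n} a_{i\ell}^2\, w_\ell^{-1}.$$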


The corresponding KT value can then be found from
(1.2)(b') as

$$\lambda_i = (g_j - P_i(g)_j)\, w_j / a_{ij}, \tag{4.2}$$

where $j$ is any index such that $a_{ij} \ne 0$. Suppose now that
we modify our objective function (with $g$ replaced by an
arbitrary $h$) by adding $\lambda_i f_i(x)$. However, the problem

$$\underset{x'a_r \le 0}{\text{Minimize}} \ \ \frac{1}{2} \sum_{j=1}^{n} (h_j - x_j)^2 w_j + \lambda_i f_i(x) \tag{4.3}$$

has the same solution as

$$\underset{x'a_r \le 0}{\text{Minimize}} \ \ -\sum_{j=1}^{n} x_j w_j \left[ h_j + (P_i(g)_j - g_j) \right] + \frac{1}{2} \sum_{j=1}^{n} x_j^2 w_j,$$

which is equivalent to

$$\underset{x'a_r \le 0}{\text{Minimize}} \ \ \frac{1}{2} \sum_{j=1}^{n} \left( h_j + (P_i(g)_j - g_j) - x_j \right)^2 w_j. \tag{4.4}$$

Thus our adjustment still leaves the problem as the same
type of least squares problem, but with the value $h$ modified
to now be $h + (P_i(g) - g)$.
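The projection (4.1), the KT value (4.2), and the shifted target from (4.4) can be written out directly. The following is a small sketch with hypothetical function names and made-up numbers, intended only to make the formulas concrete.

```python
from typing import List, Sequence


def project_halfspace(
    g: Sequence[float], a: Sequence[float], w: Sequence[float]
) -> List[float]:
    """(4.1): minimize 1/2 * sum w_j (g_j - x_j)^2 subject to a'x <= 0."""
    s = sum(aj * gj for aj, gj in zip(a, g))
    if s <= 0:
        return list(g)  # g already satisfies the constraint
    denom = sum(aj * aj / wj for aj, wj in zip(a, w))
    return [gj - (s / denom) * aj / wj for gj, aj, wj in zip(g, a, w)]


def kt_value(
    g: Sequence[float], p: Sequence[float], a: Sequence[float], w: Sequence[float]
) -> float:
    """(4.2): lambda = (g_j - p_j) * w_j / a_j for any index j with a_j != 0."""
    j = next(k for k, ak in enumerate(a) if ak != 0)
    return (g[j] - p[j]) * w[j] / a[j]


def shifted_target(
    h: Sequence[float], g: Sequence[float], p: Sequence[float]
) -> List[float]:
    """Target of (4.4): h + (P_i(g) - g)."""
    return [hj + (pj - gj) for hj, gj, pj in zip(h, g, p)]


# Made-up example data.
g = [3.0, 1.0]
a = [1.0, 1.0]
w = [1.0, 2.0]
p = project_halfspace(g, a, w)  # lands on the boundary a'x = 0
lam = kt_value(g, p, a, w)
h_new = shifted_target([0.0, 0.0], g, p)
```

Any index $j$ with $a_{ij} \ne 0$ gives the same value in (4.2); here both coordinates yield $\lambda = 8/3$.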

We can now apply the method proposed in Section 2, which
can be expressed as follows:

1)  Set $g_{11} = P_1(g)$, and $I_{11} = g_{11} - g$.

2)  Set $g_{12} = P_2(g_{11})$, and $I_{12} = g_{12} - g_{11}$.

3)  Continue, until $g_{1m} = P_m(g_{1,m-1})$ and $I_{1m} = g_{1m} - g_{1,m-1}$.

4)  Now set $g_{21} = P_1(g_{1m} - I_{11})$, and $I_{21} = g_{21} - (g_{1m} - I_{11})$.

5)  Continue. In general, set $g_{nj} = P_j(g_{n,j-1} - I_{n-1,j})$
and $I_{nj} = g_{nj} - (g_{n,j-1} - I_{n-1,j})$ if $j > 1$, and
$g_{n1} = P_1(g_{n-1,m} - I_{n-1,1})$ and
$I_{n1} = g_{n1} - (g_{n-1,m} - I_{n-1,1})$.

This scheme is easy to program since the projections are
of such simple form, and there is no branching or searching
involved. Moreover, $g_{nj}$ is guaranteed to converge to the
true solution. This is a special case of an algorithm given
by Dykstra (1983).
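Steps 1) to 5) translate almost line for line into code. The sketch below is a minimal illustration rather than the authors' program; the function names, the fixed number of sweeps, and the example data are assumptions. The list `incr[j]` plays the role of the increment $I_{n-1,j}$ and is zero before the first sweep.

```python
from typing import List, Sequence


def project_halfspace(
    g: Sequence[float], a: Sequence[float], w: Sequence[float]
) -> List[float]:
    """Weighted least squares projection of g onto {x : a'x <= 0}."""
    s = sum(aj * gj for aj, gj in zip(a, g))
    if s <= 0:
        return list(g)
    denom = sum(aj * aj / wj for aj, wj in zip(a, w))
    return [gj - (s / denom) * aj / wj for gj, aj, wj in zip(g, a, w)]


def cyclic_projections(
    g: Sequence[float],
    A: Sequence[Sequence[float]],
    w: Sequence[float],
    sweeps: int = 200,
) -> List[float]:
    m, n = len(A), len(g)
    incr = [[0.0] * n for _ in range(m)]  # I_{0,j} = 0
    cur = list(g)
    for _ in range(sweeps):
        for j in range(m):
            # Remove the previous increment for constraint j, then project.
            start = [c - d for c, d in zip(cur, incr[j])]  # g_{n,j-1} - I_{n-1,j}
            cur = project_halfspace(start, A[j], w)        # g_{n,j}
            incr[j] = [c - s for c, s in zip(cur, start)]  # I_{n,j}
    return cur


g = [2.0, 1.0]
A = [[1.0, 0.0], [1.0, 1.0]]  # x1 <= 0 and x1 + x2 <= 0
w = [1.0, 1.0]
solution = cyclic_projections(g, A, w)
# solution approaches (0, 0), the least squares projection of g.
```

The increments matter: for this example, cycling through the two projections without subtracting them stops at the feasible point (-0.5, 0.5), which is farther from g than the true projection (0, 0).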

2.  Depending upon the nature of the linear constraints,
it may be possible to easily find the projection under several
simultaneous constraints. For example, if

$$f_i(x) = x_i - x_{i+1}, \quad i = 1, \ldots, n-1,$$

the set of vectors which satisfy all $n-1$ constraints is just
the set of nondecreasing vectors. Projections onto these types of
regions are quite tractable (see Barlow, Bartholomew, Bremner
and Brunk (1972)). Another set of linear inequality constraints
which can be simultaneously handled is

$$f_i(x) = [\text{illegible}] - x_{i+1}, \quad i = 1, \ldots, n-1$$

(Shaked (1979) and Dykstra and Robertson (1983)).

By being able to handle large constraint sets, we can
improve the efficiency of our method.

To elaborate, we consider the problem

$$\underset{x'A_i \le 0}{\text{Minimize}} \ \ \frac{1}{2} \sum_{j=1}^{n} (g_j - x_j)^2 w_j, \quad i = 1, \ldots, k, \tag{4.5}$$

where $A_i = (a_{i1}, \ldots, a_{i m_i})$ is an $n \times m_i$ matrix of
independent columns. We assume that we can solve (4.5) for any
particular $i$ and any $g$, and will denote the solution by
$P_i(g)$. It can be shown that the KT vector for this problem
is

$$\underset{m_i \times 1}{\lambda_i} = (A_i'A_i)^{-1} A_i' \left( g \cdot w - P_i(g) \cdot w \right),$$

where $x \cdot y$ denotes coordinatewise multiplication.

Interestingly enough, we can repeat the argument used in
(4.3) and (4.4) to derive the same type of result. Thus it follows
that our earlier stated algorithm is still valid if $P_i(g)$
denotes the projection of $g$ onto $\{x : x'A_i \le 0\}$.

This  extension  may  prove  very  useful  for  situations  where 
there  are  a  great  many  constraints.  For  example,  Dykstra  and 
Robertson  (1982)  were  able  to  find  least  squares  projections 
of  rectangular  arrays  under  the  constraints  of  nondecreasing 
rows  and  nondecreasing  columns  for  even  large  arrays. 
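To make the block version concrete, here is a hedged sketch for the array example just mentioned: one block of constraints forces every row to be nondecreasing, the other every column, and each block projection is computed by the pool-adjacent-violators algorithm. This is an illustration under unit weights, not the code used by Dykstra and Robertson; all function names and the sweep count are assumptions.

```python
from typing import List, Sequence


def pava(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted least squares projection of y onto nondecreasing sequences."""
    blocks: List[List[float]] = []  # each block: [weighted mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    out: List[float] = []
    for mean, _, count in blocks:
        out.extend([mean] * int(count))
    return out


def project_rows(M: List[List[float]]) -> List[List[float]]:
    return [pava(row, [1.0] * len(row)) for row in M]


def project_cols(M: List[List[float]]) -> List[List[float]]:
    cols = [pava(list(col), [1.0] * len(col)) for col in zip(*M)]
    return [list(row) for row in zip(*cols)]


def matrix_isotonic(G: List[List[float]], sweeps: int = 100) -> List[List[float]]:
    """Cycle over the two block projections, carrying one increment per block."""
    projections = [project_rows, project_cols]
    r, c = len(G), len(G[0])
    incr = [[[0.0] * c for _ in range(r)] for _ in projections]
    cur = [list(row) for row in G]
    for _ in range(sweeps):
        for k, proj in enumerate(projections):
            start = [[cur[i][j] - incr[k][i][j] for j in range(c)] for i in range(r)]
            cur = proj(start)
            incr[k] = [[cur[i][j] - start[i][j] for j in range(c)] for i in range(r)]
    return cur


G = [[2.0, 1.0], [1.0, 0.0]]
fit = matrix_isotonic(G)  # every entry pools to the overall mean for this input
```

Handling a whole row or column block at once means only two increments are stored and updated per sweep, rather than one per individual inequality.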


3.  As a final example, we consider the problem of finding
I-projections onto an intersection of linear inequality regions.

We will not elaborate on the importance of I-projections,
only state that they occur in a myriad of places in many
different settings. We refer the reader to Kullback (1959) or
Csiszar (1975) for elaboration on their importance. The
problem we are dealing with is

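$$\text{Minimize} \ \ \sum_{j=1}^{n} p_j \ln(p_j / r_j) \quad \text{subject to} \quad a_i'p \le c_i,\ i = 1, \ldots, m, \quad \sum_{j=1}^{n} p_j = 1, \quad p_j \ge 0, \tag{4.6}$$

where $r \ne 0$ is a finite, nonnegative vector and the set
of feasible points is not empty. We will define our objective
function as

$$
f_0(p) =
\begin{cases}
\sum_{j=1}^{n} p_j \ln(p_j / r_j), & p_j \ge 0,\ \sum_{j=1}^{n} p_j = 1 \\
+\infty, & \text{elsewhere.}
\end{cases}
$$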


Our constraint functions are

$$f_i(p) = (a_i - c_i)'p = \sum_{j=1}^{n} (a_{ij} - c_i)\, p_j, \quad i = 1, \ldots, m,$$

where $c_i$ is a vector of constants (each component equal to
the scalar $c_i$). We note that $r$ can be arbitrarily scaled
without changing the problem.

The solution to the problem of minimizing $f_0(p)$ subject
[...]
providing it is nonnegative, and zero otherwise. It also turns
out that $\lambda_i$ is the KT value associated with the problem.

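Now if we modify our objective function (with $r$ replaced
by an arbitrary nonnegative, nonzero $h$) by adding $\lambda_i f_i(p)$,
our problem becomes

$$\underset{(a_r - c_r)'p \le 0}{\text{Minimize}} \ \ f_0(p) + \lambda_i \sum_{j=1}^{n} (a_{ij} - c_i)\, p_j,$$

or

$$\underset{\substack{(a_r - c_r)'p \le 0 \\ p_j \ge 0\ \forall j,\ \sum p_j = 1}}{\text{Minimize}} \ \ \sum_{j=1}^{n} p_j \left[ \ln(p_j / h_j) - \ln e^{-\lambda_i (a_{ij} - c_i)} \right],$$

or equivalently,

$$\underset{\substack{(a_r - c_r)'p \le 0 \\ p_j \ge 0\ \forall j,\ \sum p_j = 1}}{\text{Minimize}} \ \ \sum_{j=1}^{n} p_j \ln \frac{p_j}{h_j\, (p_{ij} / r_j)}. \tag{4.7}$$

Here $p_{ij}$ is the $j$th coordinate of the solution under the
$i$th constraint alone, so that $p_{ij}/r_j$ is proportional to
$e^{-\lambda_i (a_{ij} - c_i)}$.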


The key point is that our problem is precisely of the
same form as before, except that we have modified our vector
$h$. Thus we may use our procedure of modifying our objective
function using updated estimates of the KT vector, and only
have to solve the one type of problem. Setting $\lambda_i = 0$ in
(4.7) is equivalent to setting $p_{ij}/r_j = 1$. This scheme is
quite effective for finding I-projections under multiple
linear inequality constraints.
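The I-projection scheme can be sketched along the same lines. The report's closed-form expression for the single-constraint solution is not legible in this copy, so the code below assumes the standard exponentially tilted form $p_j \propto h_j e^{-\lambda d_j}$ with $d = a_i - c_i$, and finds $\lambda$ by bisection; that form, the function names, the bisection settings, and the example data are all assumptions of this sketch rather than details taken from the report.

```python
import math
from typing import List, Sequence, Tuple


def solve_single(h: Sequence[float], d: Sequence[float]) -> Tuple[List[float], float]:
    """Minimize sum p_j ln(p_j / h_j) over the simplex subject to d'p <= 0.

    Assumes the tilted form p_j ~ h_j * exp(-lam * d_j) and that some d_j < 0
    with h_j > 0, so that a feasible tilt exists. Returns (p, lam).
    """
    def tilt(lam: float) -> List[float]:
        wts = [hj * math.exp(-lam * dj) for hj, dj in zip(h, d)]
        total = sum(wts)
        return [x / total for x in wts]

    def slack(lam: float) -> float:
        # d'p under the tilted distribution; nonincreasing in lam.
        return sum(dj * pj for dj, pj in zip(d, tilt(lam)))

    if slack(0.0) <= 0:
        return tilt(0.0), 0.0  # constraint inactive, KT value is zero
    lo, hi = 0.0, 1.0
    while slack(hi) > 0:
        hi *= 2.0
    for _ in range(60):  # bisection on the KT value
        mid = 0.5 * (lo + hi)
        if slack(mid) > 0:
            lo = mid
        else:
            hi = mid
    return tilt(hi), hi


def i_projection(
    r: Sequence[float], D: Sequence[Sequence[float]], sweeps: int = 200
) -> Tuple[List[float], List[float]]:
    """Cycle over constraints d_i'p <= 0, updating one KT value at a time."""
    lam = [0.0] * len(D)
    p = [rj / sum(r) for rj in r]
    for _ in range(sweeps):
        for i, d in enumerate(D):
            # Modified h: r tilted by the KT values of all other constraints.
            h = [
                rj * math.exp(-sum(lam[k] * D[k][j] for k in range(len(D)) if k != i))
                for j, rj in enumerate(r)
            ]
            p, lam[i] = solve_single(h, d)
    return p, lam


# Made-up example on three points with values (0, 1, 2) and uniform r:
# mean <= 0.8 gives d = a - c = (-0.8, 0.2, 1.2); p_1 <= 0.4 gives (0.6, -0.4, -0.4).
r = [1.0, 1.0, 1.0]
D = [[-0.8, 0.2, 1.2], [0.6, -0.4, -0.4]]
p, lam = i_projection(r, D)
```

For this example both constraints are active at the solution, which forces p = (0.4, 0.4, 0.2), and solving the stationarity equations by hand gives both KT values equal to ln 2; the sketch should reproduce these up to numerical tolerance.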

In summary, this procedure seems to work quite well for
situations where many constraints are involved, and partial
solutions (solutions under partial constraints) are easily
available.